Hadoop Admin Resume New York


SUMMARY:

Over 7 years of IT experience as a developer, designer, and quality tester, with cross-platform integration experience using the Hadoop ecosystem.
Hands-on experience installing, configuring, and using Hadoop ecosystem components: HDFS, MapReduce, Pig, Hive, Oozie, Flume, HBase, Spark, and Sqoop.
Strong understanding of the various Hadoop services and of the MapReduce and YARN architectures.
Responsible for writing MapReduce programs.
Experienced in importing and exporting data to and from HDFS using Sqoop.
Experience loading data into Hive partitions and creating buckets in Hive.
Developed MapReduce jobs to automate data transfer from HBase.
Expertise in analysis using Pig, Hive, and MapReduce.
Experienced in developing UDFs for Hive and Pig using Java (a minimal sketch follows this summary).
Strong understanding of NoSQL databases such as HBase, MongoDB, and Cassandra.
Scheduled all Hadoop/Hive/Sqoop/HBase jobs using Oozie.
Experience setting up clusters on Amazon EC2 and S3, including automating cluster setup and extension in the AWS cloud.
Good understanding of Scrum methodologies, test-driven development, and continuous integration.
Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly, and adaptability to new environments; a self-motivated, focused, and adaptive team player and quick learner with excellent interpersonal, technical, and communication skills.
Experience in defining detailed application software test plans, including organization, participants, schedule, and test and application coverage scope.
Experience in gathering and defining functional and user-interface requirements for software applications.
Experience in real-time analytics with Apache Spark (RDDs, DataFrames, and the Streaming API).
Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data.
Experience integrating Hadoop with Kafka, including uploading clickstream data from Kafka to HDFS.
Expert in using Kafka as a publish-subscribe messaging system.
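For illustration, a minimal Hive UDF of the kind mentioned above might look like the sketch below. The class name, jar name, and registered function name are hypothetical, not taken from any actual project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A simple Hive UDF that upper-cases a string column.
// It would be registered in Hive with, for example:
//   ADD JAR my-udfs.jar;
//   CREATE TEMPORARY FUNCTION to_upper AS 'com.example.ToUpperUDF';
public final class ToUpperUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                                   // pass NULLs through unchanged
        }
        return new Text(input.toString().toUpperCase());   // upper-case the column value
    }
}
```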

PROFESSIONAL EXPERIENCE:

Confidential, New York

Hadoop Admin

Responsibilities:

Worked on developing architecture documents and appropriate guidelines.
Installed Kafka on virtual machines and created topics for different users (a topic-creation sketch follows this list).
Installed ZooKeeper, brokers, Schema Registry, and Control Center on multiple machines.
Set up ACL/SSL security for different users and assigned users to multiple topics.
Developed security so that users connect over SSL, and assigned access for multiple user logins.
Created process documentation and server diagrams, prepared server requisition documents, and uploaded them to SharePoint.
Used Puppet to automate deployment to the servers.
Monitored errors and warnings on the servers using Splunk.
Set up the machines with network control, static IPs, disabled firewalls, and swap memory.
Created a POC on AWS based on the services required by the project.
Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms and NiFi.
Set up Hortonworks infrastructure, from configuring clusters to nodes, and installed the Ambari server in the cloud.
Set up security using Kerberos and AD on Hortonworks clusters.
Managed cluster configuration to meet the needs of analysis workloads, whether I/O bound or CPU bound.
Worked on setting up high availability for the major production cluster and performed Hadoop version updates using automation tools.
Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
Automated the setup of Hadoop clusters and the creation of nodes.
Monitored and maintained improvements in CPU utilization.
Performance-tuned and managed growth of the OS, disk usage, and network traffic.
Responsible for building scalable distributed data solutions using Hadoop.
Involved in loading data from the Linux file system to HDFS.
Performed architecture design, data modeling, and implementation of the Big Data platform and analytic applications for the consumer products.
Analyzed the latest Big Data analytic technologies and their innovative applications in both business intelligence analysis and new service offerings.
Worked on installing the cluster, commissioning and decommissioning data nodes, name node recovery, capacity planning, and slot configuration.
Implemented test scripts to support test-driven development and continuous integration.
Optimized and tuned the application.
Created user guides and overviews for supporting teams.
Provided troubleshooting and best-practices methodology for development teams; this includes process automation and new application onboarding.
Designed monitoring solutions and baseline statistics reporting to support the implementation.
Experience designing and building solutions for both real-time and batch data ingestion using Sqoop, Pig, Impala, and Kafka.
Extremely good knowledge of and experience with MapReduce, Spark Streaming, and Spark SQL for data processing and reporting.
Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Used Apache Kafka to import real-time network log data into HDFS.
Developed business-specific custom UDFs in Hive and Pig.
Configured Oozie workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
Optimized MapReduce code by writing Pig Latin scripts.
Imported data from external tables into Hive using the LOAD command.
Created tables in Hive and used static and dynamic partitioning as the data slicing mechanism.
Working experience monitoring clusters, identifying risks, and establishing good practices to be followed in a shared environment.
Good understanding of cluster configuration and resource management using YARN.
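A minimal sketch of per-user topic creation with the Kafka AdminClient API, as referenced above. The broker address, topic name, and partition/replication counts are illustrative assumptions, not values from the actual cluster.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateUserTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical broker address; a real cluster would list all bootstrap brokers.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // One topic per user group: 3 partitions, replication factor 3 (assumed values).
            NewTopic topic = new NewTopic("user-clickstream", 3, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```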

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Splunk, Java, Puppet, Apache YARN, Pig, Spark, Tableau, Machine Learning.

Confidential, New York, New York

Hadoop Admin/Architect

Responsibilities:

Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, and Sqoop.
Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms.
Set up Hortonworks infrastructure, from configuring clusters to nodes, and installed the Ambari server in the cloud.
Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters, and assigned access for multiple user logins.
Installed and configured the CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
Extensively used Cloudera Manager to manage multiple clusters with petabytes of data.
Knowledge of documenting processes and server diagrams and preparing server requisition documents.
Set up the machines with network control, static IPs, disabled firewalls, and swap memory.
Managed cluster configuration to meet the needs of analysis workloads, whether I/O bound or CPU bound.
Worked on setting up high availability for the major production cluster and performed Hadoop version updates using automation tools.
Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
Performance-tuned and managed growth of the OS, disk usage, and network traffic.
Responsible for building scalable distributed data solutions using Hadoop.
Involved in loading data from the Linux file system to HDFS.
Performed architecture design, data modeling, and implementation of the Big Data platform and analytic applications for the consumer products.
Analyzed the latest Big Data analytic technologies and their innovative applications in both business intelligence analysis and new service offerings.
Worked on installing the cluster, commissioning and decommissioning data nodes, name node recovery, capacity planning, and slot configuration.
Implemented test scripts to support test-driven development and continuous integration.
Worked on tuning the performance of MapReduce jobs.
Responsible for managing data coming from different sources; loaded and transformed large sets of structured, semi-structured, and unstructured data.
Experience managing and reviewing Hadoop log files.
Job management using the Fair Scheduler.
Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
Used Pig predefined functions to convert fixed-width files to delimited files.
Worked on tuning Hive and Pig to improve performance and solve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
Managed datasets using Pandas DataFrames and MySQL; queried the MySQL database from Python using the Python MySQL connector (MySQLdb package) to retrieve information.
Developed various algorithms for generating several data patterns; used JIRA for bug tracking and issue tracking.
Developed a Python/Django application for analytics aggregation and reporting; used Django configuration to manage URLs and application parameters.
Generated Python Django forms to record data for online users.
Used Python and Django for creating graphics, XML processing, data exchange, and business logic.
Created Oozie workflows to run multiple MapReduce, Hive, and Pig jobs.
Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
Involved in the development of a Spark Streaming application for one of the data sources using Scala and Spark, applying the required transformations.
Imported data from different sources such as HDFS and MySQL into Spark RDDs (a JDBC-ingestion sketch follows this list).
Experienced with Spark Context, Spark-SQL, DataFrames, pair RDDs, and Spark on YARN.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
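A minimal sketch of pulling a MySQL table into Spark and expressing a SQL-style aggregation as a Spark transformation, as referenced above. The work described used Scala; this sketch uses the Spark Java API for consistency with the other examples, and the connection details, table, and output path are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MySqlToSpark {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("mysql-ingest")
                .enableHiveSupport()
                .getOrCreate();

        // Load a MySQL table into a DataFrame over JDBC (connection details assumed).
        Dataset<Row> orders = spark.read()
                .format("jdbc")
                .option("url", "jdbc:mysql://dbhost:3306/sales")
                .option("dbtable", "orders")
                .option("user", "etl_user")
                .option("password", "********")
                .load();

        // Equivalent of a Hive/SQL aggregation expressed through Spark SQL.
        orders.createOrReplaceTempView("orders");
        Dataset<Row> dailyTotals = spark.sql(
                "SELECT order_date, SUM(amount) AS total FROM orders GROUP BY order_date");

        // Write the result back to HDFS for downstream reporting.
        dailyTotals.write().mode("overwrite").parquet("hdfs:///analytics/daily_totals");
    }
}
```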

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python, Machine Learning, NLP (Natural Language Processing)

Confidential, New York, New York

Hadoop Admin

Responsibilities:

Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances for specific applications.
Installed applications on AWS EC2 instances and configured storage on S3 buckets.
Performed S3 bucket creation and policy configuration, worked on IAM role-based policies, and customized the JSON templates.
Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
Managed servers on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management.
Developed Pig scripts to transform raw data into intelligent data as specified by business users.
Worked in an AWS environment for the development and deployment of custom Hadoop applications.
Worked closely with the data modelers to model the new incoming data sets.
Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling of a few jobs).
Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala, and Cassandra, with the Hortonworks distribution.
Involved in creating Hive tables and Pig tables, loading data, and writing Hive queries and Pig scripts.
Assisted in upgrading, configuring, and maintaining various Hadoop infrastructure components such as Pig, Hive, and HBase.
Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark-SQL, DataFrames, pair RDDs, and Spark on YARN.
Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
Worked on tuning Hive and Pig to improve performance and solve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
Imported data from different sources such as HDFS and HBase into Spark RDDs.
Developed a data pipeline using Kafka and Storm to store data in HDFS, and performed real-time analysis on the incoming data.
Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (a streaming sketch follows this list).
Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
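A minimal sketch of the Kafka-to-HDFS streaming pattern referenced above, consuming a Kafka topic in micro-batches with Spark Streaming and persisting each batch to HDFS. The work described used Scala; this sketch uses the Spark Java API, and the broker address, topic name, group id, batch interval, and HDFS path are all assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class NetworkLogStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("network-log-stream");
        // 30-second micro-batches feed the Spark engine for batch-style processing.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "network-log-consumers");
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Arrays.asList("network-logs"), kafkaParams));   // hypothetical topic

        // Persist each non-empty micro-batch of log lines to HDFS.
        stream.map(ConsumerRecord::value)
              .foreachRDD(rdd -> {
                  if (!rdd.isEmpty()) {
                      rdd.saveAsTextFile("hdfs:///data/network-logs/" + System.currentTimeMillis());
                  }
              });

        jssc.start();
        jssc.awaitTermination();
    }
}
```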

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Linux.

Confidential, New York, New York

SDET

Responsibilities:

Responsible for the implementation and ongoing administration of Hadoop infrastructure and setting up that infrastructure.
Analyzed technical and functional requirements documents and designed and developed the QA test plan, test cases, and test scenarios, maintaining the end-to-end flow of the process.
Developed testing scripts for an internal brokerage application used by branch and financial market representatives to recommend and manage customer portfolios, including international and capital markets.
Designed and developed smoke and regression automation scripts and an automated functional testing framework for all modules using Selenium WebDriver.
Created data-driven scripts for adding multiple customers, checking online accounts, validating user interfaces, and validating reports.
Performed cross-verification of trade entry between the mainframe system, its web application, and the downstream system.
Extensively used the Selenium WebDriver API (XPath and CSS locators) to test the web application.
Configured Selenium WebDriver, TestNG, the Maven build tool, Cucumber, and a BDD framework, and created Selenium automation scripts in Java using TestNG.
Performed data-driven testing by developing a Java-based library to read test data from Excel and properties files (a minimal sketch follows this list).
Extensively performed DB2 database testing to validate trade entry from the mainframe to the backend system.
Developed a data-driven framework with Java, Selenium WebDriver, and Apache POI, used to perform multiple trade order entries.
Developed an internal application using Angular.js and Node.js connecting to Oracle on the backend.
Expertise in debugging issues in the front end of the web-based application, which was developed using HTML5, CSS3, AngularJS, Node.js, and Java.
Developed a smoke automation test suite for the regression test suite.
Applied various testing techniques in test cases to cover all business scenarios for quality coverage.
Interacted with the development team to understand the design flow, review code, and discuss the unit test plan.
Executed tests in system and integration regression testing in the testing environment.
Conducted defect triage meetings and defect root-cause analysis, tracked defects in HP ALM Quality Center, managed defects by following up on open items, and retested defects with regression testing.
Provided QA/UAT sign-off after closely reviewing all the test cases in Quality Center, along with receiving the policy sign-off for the project.
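A minimal sketch of the data-driven Selenium WebDriver plus TestNG pattern referenced above. The application URL, locators, and test data are hypothetical; in the real framework the data would be read from Excel via Apache POI rather than hard-coded in the DataProvider.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.Assert;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class AccountSearchTest {

    private WebDriver driver;

    @BeforeClass
    public void setUp() {
        driver = new ChromeDriver();
        driver.get("https://example.com/accounts");   // hypothetical application URL
    }

    // Inline values keep the sketch self-contained; the real framework
    // would load this table from Excel/properties files via Apache POI.
    @DataProvider(name = "customers")
    public Object[][] customers() {
        return new Object[][] {
            {"ACCT-1001", "Jane Doe"},
            {"ACCT-1002", "John Smith"}
        };
    }

    @Test(dataProvider = "customers")
    public void searchReturnsCustomer(String accountId, String expectedName) {
        driver.findElement(By.cssSelector("#account-search")).clear();
        driver.findElement(By.cssSelector("#account-search")).sendKeys(accountId);
        driver.findElement(By.xpath("//button[@id='search-btn']")).click();
        String name = driver.findElement(By.cssSelector(".customer-name")).getText();
        Assert.assertEquals(name, expectedName);
    }

    @AfterClass
    public void tearDown() {
        driver.quit();
    }
}
```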

Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, AngularJS, Node.js, Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, Putty, WinSCP, FTP Server, Notepad++, C#, DbVisualizer.


